Entrez-PubMed

Page 1 of 3



EXHIBIT A 





Search PubMed for "sequence-specific DNA binding protein"
Limits: Publication Date to 1999/08/26
Display: Summary; Show: 20
Items 1-20 of 289 (Page 1 of 15)



1: Kim SK, Wang JC. Related Articles, Links

Gene silencing via protein-mediated subcellular localization of DNA.
Proc Natl Acad Sci USA. 1999 Jul 20;96(15):8557-61.
PMID: 10411914 [PubMed - indexed for MEDLINE]

2: Lum PL, Schildbach JF. Related Articles, Links

Specific DNA recognition by F Factor TraY involves beta-sheet residues.
J Biol Chem. 1999 Jul 9;274(28):19644-8.
PMID: 10391902 [PubMed - indexed for MEDLINE]

3: Radkov SA, Touitou R, Brehm A, Rowe M, West M, Kouzarides T, Allday MJ. Related Articles, Links

Epstein-Barr virus nuclear antigen 3C interacts with histone deacetylase to repress transcription.
J Virol. 1999 Jul;73(7):5688-97.
PMID: 10364319 [PubMed - indexed for MEDLINE]

4: Abu-Elneel K, Kapeller I, Shlomai J. Related Articles, Links

Universal minicircle sequence-binding protein, a sequence-specific DNA-binding protein that recognizes the two replication origins of the kinetoplast DNA minicircle.
J Biol Chem. 1999 May 7;274(19):13419-26.
PMID: 10224106 [PubMed - indexed for MEDLINE]

5: Ehrlich KC, Montalbano BG, Cary JW. Related Articles, Links

Binding of the C6-zinc cluster protein, AFLR, to the promoters of aflatoxin pathway biosynthesis genes in Aspergillus parasiticus.
Gene. 1999 Apr 16;230(2):249-57.
PMID: 10216264 [PubMed - indexed for MEDLINE]

6: Bartsch O, Horstmann S, Toprak K, Klempnauer KH, Ferrari S. Related Articles, Links

Identification of cyclin A/Cdk2 phosphorylation sites in B-Myb.
Eur J Biochem. 1999 Mar;260(2):384-91.
PMID: 10095772 [PubMed - indexed for MEDLINE]

7: Von Ohlen T, Hooper JE. Related Articles, Links

The ciD mutation encodes a chimeric protein whose activity is regulated by Wingless signaling.
Dev Biol. 1999 Apr 1;208(1):147-56.



http://ww.ncbi.nlm.nih.gov/entre^ 



1/10/2003 






PMID: 10075848 [PubMed - indexed for MEDLINE] 



8: Tolnay M, Vereshchagina LA, Tsokos GC. Related Articles, Links

Heterogeneous nuclear ribonucleoprotein D0B is a sequence-specific DNA-binding protein.
Biochem J. 1999 Mar 1;338 ( Pt 2):417-25.
PMID: 10024518 [PubMed - indexed for MEDLINE]

9: Walton M, Saura J, Young D, MacGibbon G, Hansen W, Lawlor P, Sirimanne E, Gluckman P, Dragunow M. Related Articles, Links

CCAAT-enhancer binding protein alpha is expressed in activated microglial cells after brain injury.
Brain Res Mol Brain Res. 1998 Oct 30;61(1-2):11-22.
PMID: 9795105 [PubMed - indexed for MEDLINE]

10: Banecki B, Kaguni JM, Marszalek J. Related Articles, Links

Role of adenine nucleotides, molecular chaperones and chaperonins in stabilization of DnaA initiator protein of Escherichia coli.
Biochim Biophys Acta. 1998 Oct 23;1442(1):39-48.
PMID: 9767098 [PubMed - indexed for MEDLINE]

11: Related Articles, Links

Host proteins can stimulate Tn7 transposition: a novel role for the ribosomal protein L29 and the acyl carrier protein.
EMBO J. 1998 Oct 1;17(19):5822-31.
PMID: 9755182 [PubMed - indexed for MEDLINE]

12: Simonsson S, Samuelsson T, Elias P. Related Articles, Links

The herpes simplex virus type 1 origin binding protein. Specific recognition of phosphates and methyl groups defines the interacting surface for a monomeric DNA binding domain in the major groove of DNA.
J Biol Chem. 1998 Sep 18;273(38):24633-9.
PMID: 9733759 [PubMed - indexed for MEDLINE]

13: Kortschak RD, Reimann H, Zimmer M, Eyre HJ, Saint R, Jenne DE. Related Articles, Links

The human dead ringer/bright homolog, DRIL1: cDNA cloning, gene structure, and mapping to D19S886, a marker on 19p13.3 that is strictly linked to the Peutz-Jeghers syndrome.
Genomics. 1998 Jul 15;51(2):288-92.
PMID: 9722953 [PubMed - indexed for MEDLINE]

14: Raina R, Schlappi M, Karunanandaa B, Elhofy A, Fedoroff N. Related Articles, Links

Concerted formation of macromolecular Suppressor-mutator transposition complexes.
Proc Natl Acad Sci USA. 1998 Jul 21;95(15):8526-31.
PMID: 9671711 [PubMed - indexed for MEDLINE]

15: Brown JL, Mucci D, Whiteley M, Dirksen ML, Kassis JA. Related Articles, Links

The Drosophila Polycomb group gene pleiohomeotic encodes a DNA binding protein with homology to the transcription factor YY1.
Mol Cell. 1998 Jun;1(7):1057-64.
PMID: 9651589 [PubMed - indexed for MEDLINE]







16: Ariumi Y, Shimotohno K, Noda M, Hatanaka M. Related Articles, Links

Characterization of the internal promoter of human T-cell leukemia virus type I.
FEBS Lett. 1998 Feb 13;423(1):25-30.
PMID: 9506835 [PubMed - indexed for MEDLINE]

17: Boehmer PE. Related Articles, Links

The herpes simplex virus type-1 single-strand DNA-binding protein, ICP8, increases the processivity of the UL9 protein DNA helicase.
J Biol Chem. 1998 Jan 30;273(5):2676-83.
PMID: 9446572 [PubMed - indexed for MEDLINE]

18: Sutton MD, Kaguni JM. Related Articles, Links

The Escherichia coli dnaA gene: four functional domains.
J Mol Biol. 1997 Dec 12;274(4):546-61.
PMID: 9417934 [PubMed - indexed for MEDLINE]

19: Kitada T, Seki S, Nakatani K, Kawada N, Kuroki T, Monna T. Related Articles, Links

Hepatic expression of c-Myb in chronic human liver disease.
Hepatology. 1997 Dec;26(6):1506-12.
PMID: 9397991 [PubMed - indexed for MEDLINE]

20: Gariglio M, Ying GG, Hertel L, Gaboli M, Clerc RG, Landolfo S. Related Articles, Links

The high-mobility group protein T160 binds to both linear and cruciform DNA and mediates DNA bending as determined by ring closure.
Exp Cell Res. 1997 Nov 1;236(2):472-81.
PMID: 9367632 [PubMed - indexed for MEDLINE]




EXHIBIT B 




Search PubMed for "recognition sequence" and DNA and "binding p...
Limits: Publication Date to 1999/08/26
Display: Summary; Show: 20
Items 1-20 of 54 (Page 1 of 3)



1: Gaszner M, Vazquez J, Schedl P. Related Articles, Links

The Zw5 protein, a component of the scs chromatin domain boundary, is able to block enhancer-promoter interaction.
Genes Dev. 1999 Aug 15;13(16):2098-107.
PMID: 10465787 [PubMed - indexed for MEDLINE]

2: Dhavan GM, Lapham J, Yang S, Crothers DM. Related Articles, Links

Decreased imino proton exchange and base-pair opening in the IHF-DNA complex measured by NMR.
J Mol Biol. 1999 May 14;288(4):659-71.
PMID: 10329171 [PubMed - indexed for MEDLINE]

3: Baillie RA, Sha X, Thuillier P, Clarke SD. Related Articles, Links

A novel 3T3-L1 preadipocyte variant that expresses PPARgamma2 and RXRalpha but does not undergo differentiation.
J Lipid Res. 1998 Oct;39(10):2048-53.
PMID: 9788251 [PubMed - indexed for MEDLINE]

4: Simonsson S, Samuelsson T, Elias P. Related Articles, Links

The herpes simplex virus type 1 origin binding protein. Specific recognition of phosphates and methyl groups defines the interacting surface for a monomeric DNA binding domain in the major groove of DNA.
J Biol Chem. 1998 Sep 18;273(38):24633-9.
PMID: 9733759 [PubMed - indexed for MEDLINE]

5: Abidi FE, Roh H, Keath EJ. Related Articles, Links

Identification and characterization of a phase-specific, nuclear DNA binding protein from the dimorphic pathogenic fungus Histoplasma capsulatum.
Infect Immun. 1998 Aug;66(8):3867-73.
PMID: 9673274 [PubMed - indexed for MEDLINE]

6: Cox GS, Gutkin DW, Haas MJ, Cosgrove DE. Related Articles, Links

Isolation of an Alu repetitive DNA binding protein and effect of CpG methylation on binding to its recognition sequence.
Biochim Biophys Acta. 1998 Mar 4;1396(1):67-87.
PMID: 9524225 [PubMed - indexed for MEDLINE]

7: Sun W, Hattman S, Kool E. Related Articles, Links

Interaction of the bacteriophage Mu transcriptional activator protein, C, with its target site in the mom promoter.
J Mol Biol. 1997 Nov 7;273(4):765-74.
PMID: 9367769 [PubMed - indexed for MEDLINE]



8: Nambiar A, Swamynathan SK, Kandala JC, Guntaka RV. Related Articles, Links

Characterization of the DNA-binding domain of the avian Y-box protein, chkYB-2, and mutational analysis of its single-strand binding motif in the Rous sarcoma virus enhancer.
J Virol. 1998 Feb;72(2):900-9.
PMID: 9444981 [PubMed - indexed for MEDLINE]

9: Coupe SA, Deikman J. Related Articles, Links

Characterization of a DNA-binding protein that interacts with 5' flanking regions of two fruit-ripening genes.
Plant J. 1997 Jun;11(6):1207-18.
PMID: 9225464 [PubMed - indexed for MEDLINE]

10: Keren-Tal I, Dantes A, Plehn-Dujowich D, Amsterdam A. Related Articles, Links

Association of Ad4BP/SF-1 transcription factor with steroidogenic activity in oncogene-transformed granulosa cells.
Mol Cell Endocrinol. 1997 Mar 14;127(1):49-57.
PMID: 9099900 [PubMed - indexed for MEDLINE]

11: Sawada Y, Noda M. Related Articles, Links

An adipogenic basic helix-loop-helix-leucine zipper type transcription factor (ADD1) mRNA is expressed and regulated by retinoic acid in osteoblastic cells.
Mol Endocrinol. 1996 Oct;10(10):1238-48.
PMID: 9121491 [PubMed - indexed for MEDLINE]

12: Martin SF, Spaller MR, Hergenrother PJ. Related Articles, Links

Expression and site-directed mutagenesis of the phosphatidylcholine-preferring phospholipase C of Bacillus cereus: probing the role of the active site Glu146.
Biochemistry. 1996 Oct 1;35(39):12970-7.
PMID: 8841144 [PubMed - indexed for MEDLINE]

13: Blake M, Niklinski J, Zajac-Kaye M. Related Articles, Links

Interactions of the transcription factors MIBP1 and RFX1 with the EP element of the hepatitis B virus enhancer.
J Virol. 1996 Sep;70(9):6060-6.
PMID: 8709229 [PubMed - indexed for MEDLINE]

14: Thomas M, Massimi P, Banks L. Related Articles, Links

HPV-18 E6 inhibits p53 DNA binding activity regardless of the oligomeric state of p53 or the exact p53 recognition sequence.
Oncogene. 1996 Aug 1;13(3):471-80.
PMID: 8760288 [PubMed - indexed for MEDLINE]

15: Liu PC, Phillips MA, Matsumura F. Related Articles, Links

Alteration by 2,3,7,8-Tetrachlorodibenzo-p-dioxin of CCAAT/enhancer binding protein correlates with suppression of adipocyte differentiation in 3T3-L1 cells.
Mol Pharmacol. 1996 Jun;49(6):989-97.
PMID: 8649359 [PubMed - indexed for MEDLINE]



16: Chang L, Thompson MA. Related Articles, Links

Activity of the distal positive element of the peripherin gene is dependent on proteins binding to an Ets-like recognition site and a novel inverted repeat site.
J Biol Chem. 1996 Mar 15;271(11):6467-75.
PMID: 8626448 [PubMed - indexed for MEDLINE]

17: Chen Y, Gill GN. Related Articles, Links

A heterodimeric nuclear protein complex binds two palindromic sequences in the proximal enhancer of the human erbB-2 gene.
J Biol Chem. 1996 Mar 1;271(9):5183-8.
PMID: 8617800 [PubMed - indexed for MEDLINE]

18: Walker GT, Linn CP, Nadeau JG. Related Articles, Links

DNA detection by strand displacement amplification and fluorescence polarization with signal enhancement using a DNA binding protein.
Nucleic Acids Res. 1996 Jan 15;24(2):348-53.
PMID: 8628661 [PubMed - indexed for MEDLINE]

19: Lian JB, Stein GS, Stein JL, Van Wijnen A, McCabe L, Banerjee C, Hoffmann H. Related Articles, Links

The osteocalcin gene promoter provides a molecular blueprint for regulatory mechanisms controlling bone tissue formation: role of transcription factors involved in development.
Connect Tissue Res. 1996;35(1-4):15-21.
PMID: 9084639 [PubMed - indexed for MEDLINE]

20: Samadani U, Qian X, Costa RH. Related Articles, Links

Identification of a transthyretin enhancer site that selectively binds the hepatocyte nuclear factor-3 beta isoform.
Gene Expr. 1996;6(1):23-33.
PMID: 8931989 [PubMed - indexed for MEDLINE]






EXHIBIT C 



Proc. Natl. Acad. Sci. USA
Vol. 91, pp. 12357-12361, December 1994
Biophysics

DNA recognition code of transcription factors in the
helix-turn-helix, probe helix, hormone receptor,
and zinc finger families

(DNA-protein interaction/homeodomain/leucine zipper/transcription factor GATA)

Masashi Suzuki* and Naoto Yagi†

*Medical Research Council Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, United Kingdom; and †Tohoku University, School of
Medicine, Seiryo-machi, Sendai, 980-77, Japan

Communicated by Tadamitsu Kishimoto, August 8, 1994

ABSTRACT We have previously reported that in four
transcription factor families the DNA-recognition rules can be
described as (i) chemical rules, which list possible pairings
between the 20 amino acid residues and the four DNA bases,
and (ii) stereochemical rules, which describe the base and
amino acid positions in contact. We have incorporated these
rules into a computer program and examined the nature of the
rules. Here we conclude that the DNA recognition rules are
simple, logical, and consistent. The rules are specific enough to
predict DNA-binding characteristics from a protein sequence.



A large number of transcription factors, which play dominant 
roles in transcription regulation by binding to different DNA 
sequences, have been identified. Since the three-dimensional 
structure of a protein is uniquely fixed by its amino acid 
sequence, basic rules are expected, which would predict the 
DNA-binding specificity from a transcription factor sequence. 
But, since the initial expectation of such rules (the recognition 
code) (1), many structural biologists have expressed skepti- 
cism about their existence (for example, see ref. 2). 

The crystal structures of a number of transcription factor- 
DNA complexes have been determined (3-27); also a consid- 
erable amount of biochemical, genetic, and statistical infor- 
mation about the binding specificity of transcription factors is 
available (28-34). By using these data, we have devised a 
method of analyzing the patterns of contacts between DNA 
bases and amino acid residues (35-40) and have described the 
DNA-recognition rules of four transcription factor families: 
the probe helix (PH), which includes homeo and zipper 
proteins (35, 36); the helix-turn-helix (HTH) (M.S. and M. 
Gerstein, unpublished results); the zinc finger (ZnF) (37, 38); 
and the C4 Zn-binding proteins (C4), which include hormone 
receptors and GATA proteins (38-40). These rules concern 
contacts from amino acid side chains in a recognition helix to 
DNA bases in the major groove. 

The aim of this paper is to establish a framework of 
DNA-recognition rules common to the four families and to 
examine whether, from the nature of the rules, they consti- 
tute a recognition code. 

Framework of the DNA Recognition Rules 

The DNA-recognition rules are of two types, chemical and
stereochemical. The chemical rules list possible pairing partners
of amino acid side chains and DNA bases through
hydrogen bonding or hydrophobic interaction (Fig. 1a; ref.
36). The sizes of residues are also important; from a fixed
position on an interaction surface, a longer side chain can
reach a more distant part of the DNA. The residues are
classified roughly into four groups: small, medium, large,
and aromatic (Fig. 1a; ref. 36). These chemical rules are
general for any binding motif.

The publication costs of this article were defrayed in part by page charge
payment. This article must therefore be hereby marked "advertisement"
in accordance with 18 U.S.C. §1734 solely to indicate this fact.

The inclination of the recognition helix in the major groove 
of DNA is fixed by the structural elements specific to a 
DNA-binding motif. For instance, a recognition helix of PH 
has conserved Arg/Lys positions, which bind to DNA phos- 
phates and thereby fix the binding geometry (35, 36). As a 
consequence, each binding motif uses a set of particular 
amino acid positions for base recognition. These can be easily 
summarized into a chart with specifications of the sizes of 
residues used; each DNA-binding motif has its own specific 
stereochemical chart (Fig. 1 b-e). ZnF motifs can be subdi- 
vided into two groups (37), but here only the larger group is 
discussed (A fingers). 

Binding Score 

We have incorporated the rules into a computer program, 
which is written in the C programming language and imple- 
mented under the Unix operating system. Its core function is 
to score the match between the given DNA and protein 
sequences. This binding score is essentially the number of 
contacts predicted between the two sequences and reflects 
the binding energy. 

To calculate the binding score, points for stereochemical
(see the legend to Fig. 1 b-e) and chemical (Fig. 1a) merits are
introduced. The binding score is calculated as the sum over
all the contacts of (stereochemical merit point) × (chemical
merit point) for each interaction. The chemical merit points
given to different base-residue partners are not always the
same (Fig. 1a). For instance, Arg and Lys could bind by a
hydrogen bond to T, G, or A. But in fact they recognize the
G base almost exclusively (36), because the G base in a G-C
pair is electrically polar (negatively charged), while Arg and
Lys have a positive charge. Therefore, binding of Arg or Lys
to G should be given more points than to T or A. Similarly,
not all the contacts in the stereochemical charts appear to be
equally important (refs. 36 and 37; M.S. and M. Gerstein,
unpublished results), and this is reflected in differences in the
two grades of stereochemical merit points (see contacts
marked with diamonds and those not in Fig. 1 b-e).
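
The scoring rule just described can be written compactly. The notation is introduced here for illustration only and does not appear in the original: K is the set of predicted contacts, s_k is the stereochemical merit point of contact k (10 for a contact marked with a diamond and 5 for any other charted contact, per the legend to Fig. 1), and c_k is its chemical merit point.

```latex
S = \sum_{k \in K} s_k \, c_k , \qquad s_k \in \{10,\, 5\}
```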

Often several different sets of contacts are possible for 
given protein and DNA sequences. In this case, the pairing 
with the highest score is chosen. However, it is stereo- 
chemically forbidden to make two contacts that cross over 
each other in the chart. For instance, in Fig. 1c aa 5 can
contact C3, and aa 8 can contact C2, but not simultaneously. 
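
One way to choose the highest-scoring set of contacts while forbidding crossings is a dynamic program over residues and bases taken in chart order. The sketch below is an illustration only and is not the authors' C program: the function name, the example weights (150 and 75), and the simplification that each residue and each base takes part in at most one contact are all assumptions made here.

```python
def best_noncrossing_score(weights):
    """Return the highest total score of a non-crossing set of contacts.

    weights[i][j] is the score of a contact between residue i and base j
    (0 where the chart allows no contact). Residues and bases are indexed
    in the left-to-right order of the chart, so contacts (i, j) and
    (i2, j2) cross when i < i2 and j > j2.
    Assumption: each residue and each base is used at most once.
    """
    n_res = len(weights)
    n_base = len(weights[0]) if weights else 0
    # best[i][j]: best score using only the first i residues and first j bases
    best = [[0] * (n_base + 1) for _ in range(n_res + 1)]
    for i in range(1, n_res + 1):
        for j in range(1, n_base + 1):
            best[i][j] = max(
                best[i - 1][j],                              # residue i unused
                best[i][j - 1],                              # base j unused
                best[i - 1][j - 1] + weights[i - 1][j - 1],  # pair them
            )
    return best[n_res][n_base]


# Hypothetical weights. Rows: aa 5, aa 8. Columns: C2, C3.
# aa 5 -> C3 and aa 8 -> C2 cross, so only one of them can be kept.
crossing = [
    [0, 150],
    [75, 0],
]
crossing_score = best_noncrossing_score(crossing)
```

With these made-up weights the two contacts cannot be counted together, so the program keeps the stronger one instead of summing both.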
As an example, the binding score of CAP (Fig. 2h) is the sum 
of the products of the chemical and stereochemical points for 



Abbreviations: PH, probe helix; HTH, helix-turn-helix; ZnF, zinc
finger; C4, C4 Zn-binding protein.






12358 Biophysics: Suzuki and Yagi 



[Fig. 1 graphic: (a) chemical rules table pairing the bases A, T, G, and C with small, medium, large, and aromatic residues and their chemical merit points; (b) HTH, (c) PH, (d) C4, and (e) ZnF stereochemical charts; (f) C4 and (g) ZnF recognition code tables for positions W4, W3, and W2.]

Fig. 1. Chemical (a) and stereochemical (b-e) rules that make the
DNA-recognition code and code tables for C4 (f) and ZnF (g). (a) The
chemical merit points are also shown. Residues in boldfaced letters are
those important for specificity (specific residues). (b-e) Sketches of
the DNA major groove with the bases, W1-W4 (top strand) and C1-C4
(bottom strand), to which a recognition helix (in the middle line) binds.
The sizes of residues, small (s), medium (m), and large (l), used for the
contacts are also shown. Aromatic residues may often be included
with the large group. Ten stereochemical merit points are given to the
contacts marked with diamonds and five to the other contacts. No
stereochemical points are given otherwise. If a hydrophobic interaction
takes place to a T base and if one of the two neighboring bases is
another T, an additional 3 points is added to the chemical merit point,
since this is likely to enhance the hydrophobic environment. The
binding specificity of Asn (aa 1) of PH is affected by Asn (aa 2) through
side chain-side chain interactions (36); if Asn occupies position 2, Asn
(aa 1) interacts with Asn (aa 2) and binds to A (W2), but if not, Asn (aa
1) bridges the C1 and W2 bases. For this reason, if position 2 is
occupied by Asn, the chemical merit point of Asn (aa 1) to A (W2) is
kept at 15; if not, it is decreased to 10 and the residue is allowed to bind
to the C1 base at the same time. When a single residue binds to two
bases simultaneously, the two contacts are handled independently.
This is to simplify the computer program, although the two bases
bridged in this way are limited and can be handled as a set (36). The
code tables (f and g) are made by choosing the columns from a
according to the residue sizes specified in d and e. The interaction of
hydrophobic residues to the C base is weaker and therefore is shown
by plain instead of boldfaced letters. Position 3 in ZnF can be occupied
by a medium or large residue, but a medium residue is preferable (37);
the large residues are shown in the parentheses.

the Arg-G, Arg-G, and Glu-C contacts, respectively: (10 × 15) +
(5 × 15) + (10 × 12) = 345.
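
The CAP arithmetic above can be reproduced in a few lines. This is a minimal sketch, not the authors' program: the function and variable names are hypothetical, and the lookup table holds only the two chemical merit points used in this example (Arg to G, 15; Glu to C, 12). The stereochemical points of 10 and 5 follow the legend to Fig. 1.

```python
# Chemical merit points used in the CAP example (only these two are included).
CHEMICAL_POINTS = {("Arg", "G"): 15, ("Glu", "C"): 12}

# Stereochemical merit points: 10 for contacts marked with a diamond,
# 5 for the other charted contacts (legend to Fig. 1).
DIAMOND, PLAIN = 10, 5


def binding_score(contacts):
    """Sum (stereochemical point) x (chemical point) over all contacts.

    contacts: iterable of (residue, base, stereochemical_point) tuples.
    """
    return sum(stereo * CHEMICAL_POINTS[(residue, base)]
               for residue, base, stereo in contacts)


# The three CAP contacts, in the order given in the text.
cap_contacts = [
    ("Arg", "G", DIAMOND),
    ("Arg", "G", PLAIN),
    ("Glu", "C", DIAMOND),
]
cap_score = binding_score(cap_contacts)
```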

Consistency and Specificity of the Rules 

The DNA recognition rules were originally deduced from 25 
crystal structures (3-27) and many other transcription factors 




whose binding specificity has been characterized by genetic 
or biochemical experiments (see the references cited in refs 
35-40). 

Contacts were predicted by the program for 73 recognition 
helices: those of 10 PH proteins, 20 HTH proteins, 38 ZnF 
proteins (specific or very specific A fingers listed in ref. 37), 
and 5 C4 proteins (selected examples are shown in Fig. 2). 

In most examples, the predicted contacts are essentially 
the same as those observed or predicted in earlier work. Thus 
the rules can consistently explain the amino acid-base con- 
tacts. However, this does not necessarily suggest that the 
rules can explain how factors discriminate between the target 
and other DNA sequences; if many other DNA sequences 
were recognized by a factor in similar ways, the factor could 
not choose the correct site. We now examine this aspect 
(specificity) of the rules in two ways. 

We first compare the binding score given to the real binding
site with those for sites consisting of all other possible base
combinations (Fig. 3). HTH, C4, and ZnF recognize four
base pairs, which have 256 possible combinations. PH recognizes
three base pairs, and the number of combinations is
64. In our calculation, the real binding sequence is usually
found among a small number of DNA sequences that score
the highest (Fig. 3); the rules are sufficiently specific to
exclude the rest of the DNA sequences, which score less. To
evaluate the specificity of the rules, we introduce the specificity
index, which is defined as (100 - n - m/2)%, where n is
the percentage of the DNA sequences that score higher than
the real binding sequence and m is that of the DNA sequences
that score the same as the real binding sequence. If a factor
has two natural binding sequences (sequence i, which scores
higher than sequence j), n is defined as the percentage that
scores higher than i, and m is defined as the percentage that
scores between i and j. The average indices calculated for
the factors are 93% (PH) (96% if Max is excluded, which is
further discussed in M.S. and M. Gerstein, unpublished
results), 99% (C4), 96% (ZnF), and 92% (HTH).
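
The specificity index can be sketched as an exhaustive comparison over all 4^k candidate sites. Several points below are assumptions rather than statements from the paper: the tie term is taken as m/2 (the formula is damaged in the scanned text), the real site itself is counted among the sequences that "score the same", and `toy_score` is a made-up stand-in for the real binding score.

```python
from itertools import product


def specificity_index(score_site, target, alphabet="ACGT"):
    """Return 100 - n - m/2, in percent, for one real binding site.

    n: percentage of all candidate sites scoring higher than the target.
    m: percentage scoring the same as the target (assumption: the target
       itself is included in this count).
    """
    target_score = score_site(target)
    candidates = ["".join(p) for p in product(alphabet, repeat=len(target))]
    total = len(candidates)  # 256 for four base pairs, 64 for three
    higher = sum(1 for s in candidates if score_site(s) > target_score)
    tied = sum(1 for s in candidates if score_site(s) == target_score)
    n = 100.0 * higher / total
    m = 100.0 * tied / total
    return 100.0 - n - m / 2.0


def toy_score(site, preferred="ACGT"):
    """Hypothetical scorer: 10 points for each base matching `preferred`."""
    return 10 * sum(a == b for a, b in zip(site, preferred))


si_best = specificity_index(toy_score, "ACGT")   # unique top scorer
si_near = specificity_index(toy_score, "ACGA")   # one mismatch
```

Under these assumptions a site that is the unique top scorer among 256 candidates gets an index just below 100, and each tie or better-scoring competitor lowers it.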

As a second test we now examine the DNA sequence of a 
region regulated by a transcription factor in vivo. When the 
binding score is calculated for every four base pairs along the 
DNA, shifting one base pair at a time, the highest score is 
given for the experimentally identified binding site (Fig. 4). 
Since DNA has two strands, the score must be calculated 
along each of the two strands. 
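
The scan described above, a four-base-pair window moved one base pair at a time along both strands, might look like the following. This is a sketch under stated assumptions: `toy_score` and the example sequence are invented for illustration, and reading the second strand as the reverse complement of each window is one reasonable convention, not necessarily the one used in the original program.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def scan_both_strands(dna, score_site, width=4):
    """Score every window of `width` bases on both strands.

    Returns a list of (start, strand, score) tuples, where start is the
    0-based window position on the given sequence and strand is "+" or "-".
    """
    hits = []
    for start in range(len(dna) - width + 1):
        window = dna[start:start + width]
        hits.append((start, "+", score_site(window)))
        hits.append((start, "-", score_site(reverse_complement(window))))
    return hits


def toy_score(site, preferred="GGAC"):
    """Hypothetical scorer: 10 points for each base matching `preferred`."""
    return 10 * sum(a == b for a, b in zip(site, preferred))


hits = scan_both_strands("TTGGACTT", toy_score)
best_hit = max(hits, key=lambda h: h[2])
```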

The above two tests have shown that the rules are highly 
specific. In the crystal structures, some additional contacts 
are seen from outside a recognition helix, but the binding 
specificity of a recognition helix seems to be essentially 
sufficient to specify uniquely the DNA-binding sites. 

Spacing Type 

An α-helix can bind to no more than five base pairs because
of the curvature of the DNA major groove; it can access only
one side of the DNA (44). To recognize more than five base
pairs, two or more helices are used in combination, essentially
by either relating the two with a twofold symmetry axis
or repeating them in tandem. The classic HTH proteins and
zipper proteins of the PH family use "symmetrical" arrangements
(denoted here as S), while ZnF proteins use a "tandem"
arrangement (denoted here as T). C4 proteins use both
types of arrangements (45).

Symmetrical arrangements can be characterized by whether
the C terminus (denoted with the "+" sign) or the N terminus
(denoted with the "-" sign) is closer to the dyad axis and the
number of bases along the DNA between the two binding sites
(for example, S +6 for the HTH protein CAP). By knowing the
spacing type, the plot of the binding score can be improved.
When the binding scores of the two DNA strands for CAP
binding are shifted by six base pairs and added to each other, the


[Fig. 2 graphic, panels a-l: GCN4 (PH), a and b; Zif F3 (ZnF), c and d; Max (PH), e and f; CAP (HTH), g and h; GlucR (C4), i and j; Mat (PH), k and l.]

Fig. 2. Comparison between contacts observed in the crystal structures (a, c, e, g, i, and k) and computer-predicted contacts (b, d, f, h,
j, and l). The figures are drawn in the same way as in Fig. 1. The dotted line (· · · ·) in j shows an additional predicted hydrophobic interaction
to the neighboring T base. A pair of two dashed lines (- - -) in f show two alternative contacts with the same score. The contacts that are predicted
but not observed and those observed but not predicted are marked with circles. The side chain of Asn (aa 1) in Matα2 (k) is not well described
in the original report of the crystal structure (4). The residue is predicted to contact the C (C1) and T (W2) bases (l). Leu (aa 4) of Max is predicted
to make contacts with C (C1) or C (C2) (f). The figures of the original report (5) show that this leucine does seem close to C (C1), but the
coordinates have not been published and the paper does not mention this contact.



new plot shows a clearer peak (Fig. 4e). Thus, a weaker binding 
specificity of a HTH recognition helix (see the previous section) 
is compensated by combining two such helices. 
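
Adding the two strand profiles at a fixed offset can be sketched as below. This is only an illustration: the function name, the made-up score profiles, and the convention that the second-strand profile is read `shift` positions downstream of the first are assumptions, since the paper does not spell out how positions on the two strands are indexed.

```python
def combine_strand_scores(top, bottom, shift):
    """Add the second-strand profile, displaced by `shift`, to the first.

    top, bottom: per-position binding scores along the same coordinates.
    shift: offset in base pairs between the two half-site positions
           (assumed non-negative).
    Returns combined[i] = top[i] + bottom[i + shift] wherever both exist.
    """
    if shift < 0:
        raise ValueError("shift must be non-negative")
    usable = min(len(top), len(bottom) - shift)
    return [top[i] + bottom[i + shift] for i in range(max(usable, 0))]


# Hypothetical profiles: one strong window on each strand, six positions apart.
top_scores = [5, 40, 5, 5, 5, 5, 5, 5, 5, 5]
bottom_scores = [5, 5, 5, 5, 5, 5, 5, 40, 5, 5]
combined = combine_strand_scores(top_scores, bottom_scores, shift=6)
```

In this toy case each strand alone shows a peak of 40 over a background of 5, while the combined profile shows a peak of 80 over a background of 10 at the position where the two half-sites line up.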

The spacing type of the majority of ZnF proteins is T -1 
[i.e., two neighboring fingers share one base pair (-1) in a 
tandem (T) arrangement (37)]. A single finger appears to be 
incapable of discriminating between DNA sequences, but the 
combination of two or three fingers does seem to be sufficient 
(see figure 9 of ref. 37). This can explain why fingers are 
always found in a repeat. 

The two experimentally identified ADR1 (ZnF)-binding 
sites in its regulatory DNA region are predicted successfully 
(Fig. 4c). The two sites are likely to be recognized by a 
symmetrical dimer of ADR1 molecules, each of which has 
two ZnF motifs in tandem (T -1), with the superspacing type 
of S +6 (Fig. 4c). Therefore, the communication between 
DNA and proteins can be described with increasing accu- 
racy, from the chemical, the stereochemical, the spacing to 
the superspacing levels. 



Prediction and Design 

Our computer program successfully identifies the binding
sites of transcription factors whose binding specificities
have been characterized experimentally. Therefore, it may
be natural to expect that it can (i) predict the yet unknown
binding specificity of a protein sequence and (ii) design a
factor that would recognize a particular DNA sequence.

In the ZnF and C4 families, a simple table relating DNA
and protein sequences can be produced (Fig. 1 f and g; ref.
38). Three residues of C4 (1, 5, and 9) bind to the three
consecutive bases W2-W4, by a simple one residue-one base
relationship, while ZnF positions -1, 3, and 6 bind to
W2-W4. Therefore, by choosing specific partner residues in
the correct columns from Fig. 1a according to the amino acid
sizes shown in Fig. 1 d and e, recognition tables for the three
positions from two types can be constructed (see ref. 38 for
further discussion).



[Fig. 3 graphic, panels a-e: Zif F3 (ZnF), SI: 99%; EstR (C4), SI: 100%; CAP (HTH); Mat (PH), SI: 99%; TEF (PH). Abscissa: binding score; ordinate: number of DNA sequences.]

Fig. 3. Distribution of the binding scores for Zif268 finger 3 (ZnF) (a), estrogen receptor (C4) (b), CAP (HTH) (c), Matα2 (PH) (d), and TEF 
(PH) (e). The scores given to the real binding sites (marked with arrowheads) are compared with those given to the rest of all the possible 
combinations of DNA bases. The abscissas show the binding score, while the ordinates show the number of DNA sequences with that score. The 
specificity index (SI) is also shown. Note that TEF has Asn (aa 1) and Asn (aa 2) but Matα2 has Asn only at position 1 (see legend to Fig. 1). 



12360 Biophysics: Suzuki and Yagi 



[Figure 4, panels: EstR (C4); CAP (HTH). Ordinate ticks: 400, 200, 0, 200, 400.]


Proc. Natl. Acad. Sci. USA 91 (1994) 



[Figure 4, panel: ADR1 (ZnF).]



Fig. 4. Prediction of the binding sites for factors: estrogen receptor (C4) (a and d), CAP (HTH) (b and e), and ADR1 (ZnF) (c and f). (a-c) 
The binding score is calculated at every four base pairs by shifting one base pair along the DNA strand at a time. The DNA sequences were 
taken from refs. 41-43. The experimentally identified binding sites are marked with bars. The dotted lines show the cut-off levels, which separate 
real peaks from the background. (d-f) The binding scores to the two DNA strands are added to each other according to the spacing types. Note 
that a new peak for a dimer turns up in the center of two monomer binding sites on different DNA strands. The spacing types of symmetrical [...] 
S -4, HTH (LacR, GalR); S +5, C4 (EstR, GlcR; although the three base pairs at the center of the binding sites are often described as the spacer 
because these sequences vary, here the five base pairs that are not contacted by the recognition helices are defined as the spacer); S -6, HTH 
[...] C6 (Gal4). The spacing types of tandem arrangements identified are T -1, ZnF(A)-ZnF(A); T [...] 
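
The scanning step described in the Fig. 4 legend (a score computed over a four-base-pair window that is shifted one base pair at a time, then compared with a cut-off level) can be sketched as follows. This is only an illustration under stated assumptions: the scoring function `toy_score`, the target "TGAC", the example sequence and the cut-off value are hypothetical stand-ins, not the authors' recognition rules or their program.

```python
from typing import Callable, List, Tuple

WINDOW = 4  # the legend states that the score is calculated at every four base pairs


def scan_scores(dna: str, score: Callable[[str], float]) -> List[Tuple[int, float]]:
    """Score every 4-bp window, shifting one base pair along the strand at a time."""
    dna = dna.upper()
    return [(i, score(dna[i:i + WINDOW])) for i in range(len(dna) - WINDOW + 1)]


def peaks_above(scores: List[Tuple[int, float]], cutoff: float) -> List[int]:
    """Return the window start positions whose score exceeds the cut-off level."""
    return [pos for pos, s in scores if s > cutoff]


def toy_score(window: str, target: str = "TGAC") -> float:
    """Hypothetical score: number of positions that match an assumed target."""
    return float(sum(a == b for a, b in zip(window, target)))


if __name__ == "__main__":
    seq = "AATGACCTTGACA"  # made-up sequence with two copies of the assumed target
    print(peaks_above(scan_scores(seq, toy_score), cutoff=3.5))
```

Any real scoring function would replace `toy_score`; the window loop and the cut-off comparison are the only parts taken from the legend.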

The rules will be further improved as information becomes 
available. For example, in this study, changes in the DNA 
structure upon binding proteins and the sequence-dependent 
differences in the DNA structures are ignored. However, the 
framework and the major features of the rules are unlikely to 
change. We have shown that the DNA-recognition rules for 
well-characterized factors in the four families are simple, 
logical, consistent, and specific. We therefore believe that 
these rules constitute the DNA-recognition code. 

We thank Drs. C. Chothia, J. Finch, and A. Klug and Mr. S. E. 
Brenner for their critical reading of the paper. 

1. Pabo, C. O. & Sauer, R. T. (1984) Annu. Rev. Biochem. 53, 293-321. 
2. Matthews, B. W. (1988) Nature (London) 335, 294-295. 
3. Pabo, C. O., Aggarwal, A. K., Jordan, S. R., Beamer, L. J., Obeysekare, U. R. & Harrison, S. C. (1990) Science 247, 1210-1213. 
4. Wolberger, C., Vershon, A. K., Liu, B., Johnson, A. D. & Pabo, C. O. (1991) Cell 67, 517-528. 
5. Ferré-D'Amaré, A. R., Prendergast, G. C., Ziff, E. B. & Burley, S. K. (1993) Nature (London) 363, 38-45. 
6. Ellenberger, T. E., Brandl, C. S., Struhl, K. & Harrison, S. C. (1992) Cell 71, 1223-1237. 
7. König, P. & Richmond, T. (1993) J. Mol. Biol. 233, 139-154. 
8. Ferré-D'Amaré, A. R., Pognonec, P., Roeder, R. G. & Burley, S. K. (1994) EMBO J. 13, 180-189. 
9. Clarke, N. D., Beamer, L. J., Goldberg, H. R., Berkower, C. & Pabo, C. O. (1991) Science 254, 267-270. 
10. Schwabe, J. W., Chapman, L., Finch, J. T. & Rhodes, D. (1993) Cell 75, 567-578. 
11. Omichinski, J. G., Clore, G. M., Schaad, O., Felsenfeld, G., Trainor, C., Appella, E., Stahl, S. J. & Gronenborn, A. M. (1993) Science 261, 438-446. 
12. Kissinger, C. R., Liu, B., Martin-Blanco, E., Kornberg, T. B. & Pabo, C. O. (1990) Cell 63, 579-590. 
13. Hegde, R. S., Grossman, S. R., Laimins, L. A. & Sigler, P. B. (1992) Nature (London) 359, 505-512. 
14. Jordan, S. R. & Pabo, C. O. (1988) Science 242, 893-899. 
15. Anderson, J. E., Ptashne, M. & Harrison, S. C. (1987) Nature (London) 326, 846-852. 
16. Aggarwal, A. K., Rodgers, D. W., Drottar, M., Ptashne, M. & Harrison, S. C. (1988) Science 242, 899-907. 
17. Wolberger, C., Dong, Y., Ptashne, M. & Harrison, S. C. (1988) Nature (London) 335, 789-795. 
18. Mondragon, A. & Harrison, S. C. (1991) J. Mol. Biol. 219, 321-334. 
19. Rodgers, D. W. & Harrison, S. C. (1993) Structure 1, 227-240. 
20. Shultz, S. C., Shields, G. C. & Steitz, T. A. (1991) Science 253, 1001-1007. 
21. Brennan, R. G., Roderick, S. L., Takeda, Y. & Matthews, B. W. (1990) Proc. Natl. Acad. Sci. USA 87, 8165-8169. 
22. Feng, J.-A., Johnson, R. C. & Dickerson, R. E. (1994) Science 263, 348-355. 
23. Clark, M. L., Halay, E. D., Lai, E. & Burley, S. K. (1993) Nature (London) 364, 412-420. 
24. Pavletich, N. P. & Pabo, C. O. (1991) Science 252, 809-817. 
25. Fairall, L., Schwabe, J., Chapman, L., Finch, J. T. & Rhodes, D. (1993) Nature (London) 366, 483-487. 
26. Pavletich, N. P. & Pabo, C. O. (1993) Science 261, 1701-1707. 
27. Luisi, B. F., Xu, X. W., Otwinowski, Z., Freedman, L. P., Yamamoto, K. R. & Sigler, P. B. (1991) Nature (London) 352, 497-505. 
28. Seeman, N. C., Rosenberg, J. M. & Rich, A. (1976) Proc. Natl. Acad. Sci. USA 73, 804-808. 
29. Lehming, N., Sartorius, J., Kisters-Woike, B., von Wilcken-Bergmann, B. & Müller-Hill, B. (1991) in Nucleic Acids and Molecular Biology, eds. Eckstein, F. & Lilley, D. M. J. (Springer, Heidelberg), Vol. 5, pp. 114-125. 
30. Kisters-Woike, B., Lehming, N., Sartorius, J., von Wilcken-Bergmann, B. & Müller-Hill, B. (1991) Eur. J. Biochem. 198, 411-419. 
31. Desjarlais, J. R. & Berg, J. M. (1993) Proc. Natl. Acad. Sci. USA 90, 2256-2260. 
32. Klevit, R. E. (1991) Science 253, 1367-1393. 
33. Suckow, M., von Wilcken-Bergmann, B. & Müller-Hill, B. (1993) EMBO J. 12, 1193-1200. 
34. Treisman, J., Harris, E., Wilson, D. & Desplan, C. (1992) BioEssays 14, 145-150. 
35. Suzuki, M. (1993) EMBO J. 12, 3221-3226. 
36. Suzuki, M. (1994) Structure 2, 317-326. 
37. Suzuki, M., Gerstein, M. & Yagi, N. (1994) Nucleic Acids Res. 22, 3397-3405. 
38. Suzuki, M. (1994) Proc. Jpn. Acad. B70, 96-99. 
39. Suzuki, M. & Chothia, L. (1994) Proc. Jpn. Acad. B70, 58-61. 
40. Suzuki, M. & Yagi, N. (1994) Proc. Jpn. Acad. B70, 62-66. 
41. Deeley, M. & Yanofsky, C. (1992) J. Bacteriol. 151, 942-951. 
42. Seiler-Tuyns, A., Walker, P., Martinez, E., Merillat, A.-M., Givel, F. & Wahli, W. (1986) Nucleic Acids Res. 14, 8755-8770. 
43. Thukral, S. K., Eisen, A. & Young, E. T. (1991) Mol. Cell. Biol. 11, 1566-1577. 
44. Suzuki, M., Neuhaus, D., Gerstein, M. & Aimoto, S. (1994) Protein Eng. 7, 461-470. 
45. Umesono, K., Murakami, K. K., Thompson, C. C. & Evans, R. M. (1991) Cell 65, 1255-1267. 



Overview 



EXHIBIT D 



Molecular Cloning of Sequence-Specific DNA 
Binding Proteins Using Recognition Site Probes 



Harinder Singh¹,², Roger G. Clerc¹ and Jonathan H. LeBowitz¹ 

¹Massachusetts Institute of Technology 
²University of Chicago 



ABSTRACT 

Genes encoding sequence-specific DNA 
binding proteins can be isolated by 
screening λgt11 expression libraries with 
recognition site DNAs. This strategy is 
derived from that developed for the isola- 
tion of genes using antibody probes. Many 
different genes encoding transcriptional 
regulatory proteins have been cloned 
using this strategy. The DNA binding 
domains of these regulatory proteins con- 
tain different structural motifs including 
the helix-turn-helix, the "zinc finger" and 
the "leucine zipper". Various aspects of 
the screening strategy are evaluated and a 
detailed protocol is provided. In addition 
to binding site DNAs, protein and 
nucleotide probes have been successfully 
used to screen expression libraries. There- 
fore ligand based expression screening 
may be quite general in scope. 

252 BioTechniques 



INTRODUCTION 

Sequence-specific DNA binding 
proteins play a central role in decipher- 
ing the structural and regulatory infor- 
mation encoded in cellular and viral 
genomes. They function to initiate as 
well as control the transcription, 
replication and site-specific recombin- 
ation of DNA sequences. Biochemi- 
cally, these proteins determine the 
specificity and reactivity of enzymatic 
assemblies that act on DNA. 

In genetically tractable prokaryotes 
and eukaryotes, most sequence-spe- 
cific DNA binding proteins have been 
identified as the products of trans-ac- 
ting regulatory loci. In many complex 
eukaryotic organisms a similar ap- 
proach to their identification has not 
been possible. Instead, the recent ap- 
plication of sensitive DNA binding as- 
says, in particular, DNase I footprint- 
ing (13) and gel electrophoresis of 
protein-DNA complexes (12,14), has 
led to the detection and characteriza- 
tion of numerous sequence-specific 
DNA binding proteins. A majority of 
these proteins bind selectively to dis- 
tinct transcriptional control elements 
and are thereby implicated in regulat- 
ing the activity of their target genes 
(31). The isolation of recombinant 
clones encoding such proteins would 
facilitate a genetic and biochemical 
analysis of their structural and func- 
tional properties. Prior to the applica- 
tion of the cloning strategy described 
below, genes encoding sequence- 
specific DNA binding proteins could 
be isolated only by screening recom- 
binant DNA libraries with antibody 
(28,49,50) or oligonucleotide probes 
(2,25,49). The latter are generated from 
partial amino acid sequences of the 
relevant proteins. Both screening 
strategies are dependent on the availability 
of substantial amounts of the 
purified protein. Even though the 
purification of sequence-specific DNA 
binding proteins has been greatly 
facilitated by the development of improved 
DNA-affinity matrices (4,24,40), 
the requirement for very large 
amounts of starting material (tissue or 
cells) makes purification on a preparative 
scale difficult. The new strategy 
obviates purification of a sequence-specific 
DNA binding protein for the 
purpose of isolating its gene. It simply 
requires an appropriate recombinant 
DNA library constructed for expression 
in Escherichia coli and a DNA 
recognition site probe. Therefore, this 
strategy is ideally suited for isolating 
clones encoding rare regulatory 
molecules. 

CLONING STRATEGY 

The cloning strategy depends on the 
functional expression in E. coli of high 
levels of the DNA binding domain of a 
regulatory protein and a strong interac- 
tion between this domain and its recog- 
nition site. If these conditions are ful- 
filled, a recombinant clone encoding a 
sequence-specific DNA binding pro- 
tein can be detected by probing protein 
replica filters of an expression library 
with radiolabeled recognition site 
DNA. An outline of the steps involved 
in identifying and analyzing such a 
clone, using a recombinant library constructed 
in the expression vector λgt11, 
is depicted in Figure 1. The initial 
phase involves the identification of a 
recombinant clone that is specifically 
detected with the binding site DNA 
probe (X) but not with DNA probes 



Vol. 7, No. 3 (1989) 



that lack the given binding site or contain 
a mutant version of it (Y); see 
Figure 1. Such a clone is then shown to 
encode a β-galactosidase fusion protein 
of the expected DNA binding specificity. 
This strategy is derived from that 
developed for the isolation of genes 
using antibodies to screen recombinant 
expression libraries (19,52,53). 

Using a ³²P-labeled recognition site 
DNA probe with a specific activity of 
10[?] cpm/pmol (ca. 10[?] cpm/µg), it is 
possible to detect 10⁻² fmol of active 
protein in a plaque (assuming a 1:1 
stoichiometry for the protein-DNA 
complex). This detection limit represents 
1 pg of a β-galactosidase fusion 


[Figure 1 flow diagram, legible steps: λgt11 library; transfer proteins from plaques onto nitrocellulose filter; block non-specific binding sites; wash filter; plaque purify; analyze lysogens for specific DNA binding activity.]


Figure 1. Outline of the strategy for the 
molecular cloning of sequence-specific DNA 
binding proteins using the expression vector 
λgt11. X is a recognition site DNA probe 
whereas Y is a control DNA probe that lacks the 
given recognition site or contains a mutant version 
[...]tion of λgt11 recombinants that are specifically 
detected with DNA probe X (λX). After plaque 
purification, the gel electrophoresis DNA binding 
assay is used to analyze extracts of λX and 
λgt11 (λ) lysogens. Radiolabeled X-DNA is used 
as a probe in the binding reactions. F and B refer 
to free and bound X-DNA, respectively. Reactions 
in lanes +X and +Y are carried out with the 
λX extracts and contain an excess of either 
unlabeled X-DNA or unlabeled Y-DNA as competitors. 




protein (ca. Mr 170,000), which is an 
amount that is well below the expected 
level of expression for such a protein in 
a plaque of the desired recombinant 
λgt11 phage. In fact, overexpression of 
the lacZ fusion gene should result in 
the accumulation of ca. 100 pg of the 
fusion protein in a phage plaque, assuming 
that there are 10⁵ infected 
cells/plaque and that the β-galactosidase 
fusion protein represents 1% 
of the total protein mass (0.1 pg) of an 
infected cell. The sensitivity of detection 
achieved with a ³²P-labeled recognition 
site probe (see above) is comparable 
to that attained with an 
¹²⁵I-labeled primary antibody (3) or a 
detection system based on a secondary 
antibody conjugated with alkaline 
phosphatase (29). A comparison of the 
signals generated by a DNA binding 
site probe and an antibody directed 
against the corresponding protein is illustrated 
in Figure 2. The λgt11 recombinant 
(λEB) encodes a β-galactosidase 
fusion protein that contains the 
DNA binding domain of the Epstein-Barr 
virus nuclear antigen EBNA-1 
(38,44). A protein replica filter 
prepared from a mixed plating of λEB 
and control λgt11 recombinant phage 
was screened initially with a recognition 
site DNA probe (oriP) that contains 
two high affinity binding sites for 
EBNA-1 and subsequently with antibodies 
directed against EBNA-1. In 
this case, the higher signal obtained 
with the DNA binding site probe was 
attributed to a less sensitive secondary 
antibody conjugate containing horseradish 
peroxidase (29) used in immuno-screening. 
Note that the patterns 
of plaques detected by the two types of 
probes are superimposable. Therefore, 
a DNA binding site probe can be used 
to detect a suitable recombinant phage 
with the same fidelity as an antibody. 
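
The detection-limit estimate in this section is simple arithmetic and can be checked with a short sketch. The function names are illustrative, and the figure of 10⁵ infected cells per plaque is an inferred value (it is the number that makes 1% of 0.1 pg per cell add up to the stated ca. 100 pg); the other inputs (10⁻² fmol, Mr 170,000, 0.1 pg of protein per cell, 1%) are read from the text.

```python
def mass_pg(amount_fmol: float, mr: float) -> float:
    """Mass in picograms of `amount_fmol` femtomoles of a protein of relative mass `mr`."""
    grams = amount_fmol * 1e-15 * mr  # mol x g/mol
    return grams * 1e12               # g -> pg


def fusion_protein_per_plaque_pg(cells: float, protein_per_cell_pg: float,
                                 fusion_fraction: float) -> float:
    """Fusion protein accumulated in one plaque, in picograms."""
    return cells * protein_per_cell_pg * fusion_fraction


# Detection limit: 10^-2 fmol of a fusion protein of Mr ca. 170,000.
detection_limit = mass_pg(1e-2, 170_000)

# Expected yield: assumed 10^5 infected cells, 0.1 pg protein per cell, 1% fusion protein.
expected = fusion_protein_per_plaque_pg(1e5, 0.1, 0.01)

print(detection_limit, expected, expected / detection_limit)
```

The computed limit is about 1.7 pg, which agrees with the quoted 1 pg to within the order of magnitude, and the expected yield of about 100 pg per plaque exceeds it by well over an order of magnitude.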



SCREENING OF EXPRESSION 
LIBRARIES 

Using screening conditions 
developed with a model system, Singh 
et al. (44) isolated a cDNA clone that 
encodes an enhancer binding protein 
(H2TF1/NFκB in Table 1). This human 
cDNA clone was detected by screening 
a λgt11 expression library with a binding 
site probe derived from the enhancer 
of a major histocompatibility complex 
(MHC) class I gene. The 
recombinant clone successfully satisfied 
the criteria depicted in Figure 1. First, it 
was detected only with the wild type 
MHC element (GGGGATTCCCC) 
probe but not with control DNAs that 
lacked the MHC element or contained a 
mutant version. Secondly, it specified a 
β-galactosidase fusion protein which 
bound specifically to the MHC element 
in a gel mobility assay. The binding 




[Figure 2. The caption is largely illegible in the scan; the legible fragment acknowledges a gift of G. Milman, Johns Hop[...]]






Table 1. 

Protein          Binding Site*          Genes/Genomes That Contain The Binding Site                                  References
H2TF1/NFκB*      GGGGATTCCCC            MHC I, β2-microglobulin, Igκ, SV40, HIV, IL-2R, β-IFN                        44
NF-A2 (Oct-2)    ATGCAAAT               IgH, Igκ, H2B, U1, U2, U6, SV40                                              7,34,46
NF-A1 (Oct-1)    ATGCAAAT               IgH, Igκ, H2B, U1, U2, U6, SV40                                              47
E12              GGCAGGTGG              Igκ                                                                          35
XBP              ND                     MHC II (Aα)                                                                  21
RF-X             CCCCCTAGCAACAG         MHC II (α-DR)                                                                W. Reith et al., 1989³
YB-1             GACTAACCGGTTT          MHC II (α-DR)                                                                9
IRF-1            AAGTGA                 β-IFN                                                                        33
PRDI-BF          GAGAAGTGAAAGTG         β-IFN                                                                        T. Maniatis, personal comm.
Pit-1            GATTACATGAATATTCATGA   Prolactin, Growth hormone                                                    23
MLTF             CACGTGACCG             Adenovirus major late transcription unit, Metallothionein, γ-fibrinogen      C. Carr and P. Sharp, personal comm.
CREB             TGACGTC                Somatostatin, enkephalin [...]                                               J. Hoeffler et al., 1988¹

[...] in the binding site column generally represent one member of a set of related motifs that the cloned protein [...] 
[...] the PRD [...] screen an expression library [...] 
¹ Hoeffler, J.P., T.E. Meyer, Y. Yun, J.L. Jameson and J.F. Habener. 1988. Science 242:1430-1433. 
³ Reith, W., E. Barras, S. Satola, M. Kobr, P. Reinhart, C. Herrero Sanchez and B. Mach. 1989. PNAS (in press). 



site was further delineated by methylation 
interference analysis of the 
protein-DNA complex. The isolation 
of this clone validated the various assumptions 
on which the screening 
strategy is based. It also provided the 
impetus for its application in the isolation 
of other clones encoding sequence-specific 
DNA binding proteins. 
The isolation of a clone encoding a 
lymphoid-specific octamer binding 
protein (NF-A2 (Oct-2) in Table 1) 
(7,46) demonstrated the usefulness of 
two modifications. In this case, a multi-site 
DNA probe, consisting of four 
copies of a 26 bp oligonucleotide containing 
the octamer motif 
(ATGCAAAT), was employed. This increased 
the sensitivity of detection of 
the relevant recombinant phage (see 
below). Furthermore, in this screen, 
sonicated and denatured calf thymus 
DNA was used as a nonspecific competitor 
instead of poly(dI-dC)·poly(dI-dC). 
This substitution reduced the 
number of inappropriate recombinant 
phage that were detected (see below). 

Vinson et al. (48) have described a 
third modification in which the nitrocellulose 
replica filters were subjected 
to a denaturation/renaturation regimen 
prior to screening. This treatment enhanced 
the sensitivity of detection of a 
phage λ recombinant encoding the enhancer 
binding protein (C/EBP) (see 
below). This report also demonstrated 
enhancement of the detection signal 
with a multi-site DNA probe. 

In the year following the initial ap- 
plication of this strategy, a large num- 
ber of mammalian cDNA clones en- 
coding distinct sequence-specific DNA 
binding proteins have been isolated 
(Table 1). All of these proteins appear 
to represent transcription factors which 
regulate the activity of different 



promoters and enhancer elements (see 
Table 1). These examples facilitate the 
evaluation of different aspects of the 
screening strategy. 

Construction of Expression Library 

cDNA synthesis and cloning. Successful 
screening is critically dependent 
on the frequency with which functional 
recombinants (in-frame fusions 
of the DNA binding domain with a 
bacterial protein segment) are represented 
in a given cDNA expression 
library. The cDNA library should be 
made from mRNA isolated from a cell 
or tissue source with the highest levels 
of the desired DNA binding protein. 
First-strand cDNA synthesis should be 
carried out using random primers 
rather than oligo(dT), since the DNA 
binding domain may be encoded in the 
amino-terminal part of the desired 
protein (5' end of the corresponding 
mRNA). Adaptors rather than linkers 
are preferred for ligating the cDNA inserts 
to the vector, since they avoid 
digestion of the cDNA with a restriction 
enzyme (18,51). It should be noted 
that most commercially available 
cDNA expression libraries are constructed 
using Eco RI linkers. Some of 
these libraries contain a high frequency 
of partial cDNA inserts that are flanked 
by natural Eco RI sites, indicating inefficient 
protection of internal sites during 
their construction (K. LeClair, personal 
communication). This can result 
either in the disruption of a cDNA segment 
encoding a DNA binding domain 
or in a decrease of the frequency of 
recombinants containing in-frame 
fusions of the DNA binding domain 
with the bacterial protein segment. 

Expression vectors. The phage 
vector λgt11 appears most suitable for 
expression screening. It offers the advantages 
of high cloning efficiency, the 
expression of relatively stable β-galactosidase 
fusion proteins and a simple 
means of preparing protein replica filters. 
Recently, a new bacteriophage λ 
expression vector (λZAP) has been 
described which obviates subcloning of 
cDNA inserts into plasmid vectors for 
their analysis (42). The presence of 
multiple cloning sites makes possible 
the use of "forced cloning" strategies 
for expression of cDNA inserts from its 
lac promoter. Unlike λgt11, λZAP expresses 
fusion proteins containing a 
small amino terminal segment of β-galactosidase. 
Therefore, the stability 
of λZAP encoded fusion proteins may 
be different from their counterparts encoded 
in λgt11. 

[Figure 3 panel labels: pHE6, pHE6-λO and pHE6-EBNA-1; probes oriλ and oriP.]

Figure 3. Detection of clones encoding sequence-specific DNA binding proteins using a plasmid 
expression system. The recombinant constructs pHE6-EBNA-1 and pHE6-λO were [...] 
temperature-sensitive λ repressor gene, which permit the thermo-inducible expression of the [...] 
grown on nitrocellulose filters at 30°C. The plates were then shifted to 42°C [...] 
of proteins to nitrocellulose. [...] 

Plasmid expression vectors can also 
be used to detect clones encoding sequence-specific 
DNA binding proteins. 
Figure 3 shows that E. coli colonies 
harboring either an EBNA-1 or bacteriophage 
λ O protein-expressing 
plasmid can be specifically detected 
using the corresponding binding site 
DNA probes. Even though phage vectors 
are advantageous for most cloning 
applications, plasmid vectors could be 
used to rapidly generate, screen and 
analyze recombinants encoding mutant 
DNA binding domains. 

Preparation of Protein Replica 
Filters 

Protein replica filters suitable for 
screening with DNA recognition site 
probes are most easily prepared using a 
series of steps derived from the immuno-screening 
protocol (22, see accompanying 
protocol). This simple 
procedure has permitted the detection 
of many clones encoding different 
DNA binding proteins, e.g., H2TF1/NFκB, 
Oct-2, E12, XBP, YB-1, IRF-1, 
MLTF, CREB (see Table 1). Vinson et 
al. (48) have shown that processing 
dried nitrocellulose replica filters 
through a denaturation/renaturation 
cycle, using 6 M guanidine hydrochloride, 
significantly enhances the signal 
from a λgt11 recombinant encoding 
C/EBP (see accompanying protocol). 
However, it is not possible from this 
report to directly compare the sensitivity 
of the two protocols in detecting 
the C/EBP phage, since with the 
former the replica filters are not dried. 
The denaturation/renaturation cycle 
may increase the detection signal by 
facilitating the correct folding of a 
larger fraction of the E. coli-expressed 
protein. Alternatively, it may help to 
dissociate insoluble aggregates of the 
fusion protein that form as a consequence 
of overexpression. This modified 
procedure has been successfully 
used to isolate clones encoding Oct-1, 
Pit-1, PRDI-BF and RF-X (see Table 
1). This modification allows the rescreening 
of the same replica filter with 
a different DNA probe by repeating the 
denaturation/renaturation cycle, since 
the second denaturation step results in 
dissociation of the DNA probe bound 
in the first screen. 

Screening of Protein Replica Filters 

Recognition site DNA probe. The 
highest affinity site among a set of related 
sequences should be chosen for 
the synthesis of an oligonucleotide 
probe. It has been demonstrated that 
DNA probes containing a single recognition 
site can be used to isolate the 
relevant DNA binding protein clones 
(H2TF1/NFκB, XBP, YB-1 and 
MLTF, see Table 1). However, in a 
number of cases (Oct-2, Oct-1, E12, 
see Table 1), the signal was appreciably 
enhanced with DNA probes containing 
several copies of the appropriate 
binding site. This effect is 
demonstrated in Figure 4 with the 
recombinant phage encoding H2TF1/NFκB 
(λh3). In this case, the multi-site 
probe (trimer) was generated by cloning 
three tandem copies of a 25 bp long 
oligonucleotide containing the H2TF1/NFκB 
binding site (GGGGATTCCCC). 
When equivalent protein 
replica filters are screened with either 
the 1-mer (monomer) or the 3-mer (trimer) 
probe (each end-labeled with ³²P 
to the same specific activity), the latter 
generates a 3-5 fold higher signal. 
Multi-site probes can also be prepared 
for screening simply by catenation of a 
binding site oligonucleotide with DNA 
ligase, followed by "nick translation" 
(48). Such a probe was used to isolate 
the cDNA encoding Pit-1 (23). 

Enhancement of the signal with a 
multi-site probe may be due to the fact 
that such a probe can simultaneously 
interact with two or more immobilized 
protein molecules, thereby increasing 
the overall stability of the protein-DNA 
complexes (see below). This type of 




DNA probe is particularly suitable for 
the isolation of clones encoding DNA 
binding proteins with low affinity for 
their recognition sites. Given a number 
of examples in which a multi-site DNA 
probe increased the detection signal, it is 
clearly the preferred type of probe. 

Nonspecific competitor DNA. The 
addition of an excess of nonspecific 
competitor DNA in the probe solution 
both reduces background and minimizes 
the detection of recombinant 
phage encoding nonsequence-specific 
DNA binding proteins. Several different 
competitor DNAs have been 
used to successfully screen expression 
libraries (33,44,46). Screens of such 
libraries with poly(dI-dC)·poly(dI-dC) 
as the nonspecific competitor DNA 
yielded some recombinant phage that 
encoded proteins which preferentially 
bind single-stranded DNA (44). As 
shown in Figure 5, the signal from such 
phage (e.g., λh1), but not from phage 
encoding sequence-specific DNA 
binding proteins (e.g., λh3, which 
encodes H2TF1/NFκB), could be efficiently 
blocked with sonicated and 
denatured calf thymus DNA at a concentration 
of 5 µg/mL. The latter DNA 
further reduced the background signal 
from the filters. Based on the results of 
Figure 5 and given that several clones 
encoding sequence-specific DNA 
binding proteins have been successfully 
isolated using sonicated and denatured 
calf thymus DNA (e.g., Oct-2, 
MLTF and E12, see Table 1), this nonspecific 
competitor DNA is preferred. 

Binding and wash conditions. The 
equilibrium association constants of 
sequence-specific DNA binding proteins 
range over many orders of magnitude 
(10⁸-10¹² M⁻¹). Consideration 
of the equilibrium and kinetic constants 
of a protein-DNA interaction in solution 
suggests that successful screening 
may be restricted to proteins with relatively 
high binding constants, since 
only these are likely to form complexes 
with half-lives long enough to 
withstand the wash protocol (44). For 
example, if a regulatory protein has an 
association constant of 10¹⁰ M⁻¹, then 
under the screening conditions (the 
DNA probe is in excess and at a concentration 
of ca. 10⁻¹⁰ M), approximately 
half of the active molecules on 
the filter will have DNA bound. Since 
the filters are subsequently washed for 
30 min, the fraction of protein-DNA 
complexes that remain will be determined 
by their dissociation rate constant. 
Assuming a diffusion-limited association 
rate constant of 10⁷ M⁻¹ s⁻¹ 
(1), the dissociation rate constant in 
solution will be 10⁻³ s⁻¹. This rate constant 
translates into a half-life of approximately 
10 min. Thus, one-eighth 
of the protein-DNA complexes should 
survive the 30 min wash. For a binding 
constant of 10⁹ M⁻¹, about a tenth of 



[Figure 4. Panel labels: 1-mer, 3-mer, λh3. The caption is largely illegible in the scan; the legible fragments refer to a fusion protein that binds specifically to an MHC gene regulatory element, to the 3-mer probe, and to DNA that was a gift of A. Baldwin.]



the active protein molecules will have 
DNA bound, but virtually all of this 
signal should be lost since the half-life 
of these complexes in solution is approximately 
1 min. It is unclear 
whether the equilibrium and kinetic 
constants of a protein-DNA interaction 
in solution accurately describe the 
binding of a DNA probe to a matrix of 
protein immobilized on a filter. Thus, it 
may be possible to isolate recombinants 
encoding proteins with binding 
constants of 10⁹ M⁻¹ or lower. The sensitivity 
of detection of a phage encoding 
a low affinity variant of the Oct-2 
protein is markedly enhanced by using 
a DNA probe containing multiple binding 
sites (46). Since the association 
constants of DNA-binding regulatory 
proteins are dependent on ionic 
strength, temperature and pH, these 
parameters can be manipulated in the 
binding and wash steps to optimize the 
detection of a relevant recombinant 



[Figure 5 panel labels: poly(dI-dC); λh1.]




protein. Finally, if the DNA binding 
protein being cloned has an exogenous 
metal ion requirement (e.g., Mg²⁺), the 
binding and wash buffers should be appropriately 
supplemented. 
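
The kinetic argument in this section can be reproduced numerically. The sketch below assumes the values of the worked example (association constants of 10¹⁰ and 10⁹ M⁻¹, a probe concentration of 10⁻¹⁰ M, a diffusion-limited association rate constant of 10⁷ M⁻¹ s⁻¹ and a 30 min wash) together with simple first-order dissociation in solution; the function names are illustrative, and, as the text itself cautions, solution constants may not describe protein immobilized on a filter.

```python
import math


def fraction_bound(k_assoc: float, probe_conc: float) -> float:
    """Fraction of active protein with DNA bound when the probe is in excess."""
    x = k_assoc * probe_conc
    return x / (1.0 + x)


def dissociation_rate(k_assoc: float, k_on: float = 1e7) -> float:
    """Dissociation rate constant k_off = k_on / K_a, in s^-1."""
    return k_on / k_assoc


def half_life_min(k_off: float) -> float:
    """Half-life of the complex in minutes for first-order dissociation."""
    return math.log(2) / k_off / 60.0


def surviving_fraction(k_off: float, wash_min: float = 30.0) -> float:
    """Fraction of complexes remaining after a wash of `wash_min` minutes."""
    return math.exp(-k_off * wash_min * 60.0)


for k_a in (1e10, 1e9):
    k_off = dissociation_rate(k_a)
    print(k_a, fraction_bound(k_a, 1e-10), half_life_min(k_off), surviving_fraction(k_off))
```

With these inputs the 10¹⁰ M⁻¹ case gives a half-life of about 11.6 min and roughly one-sixth of the complexes surviving the wash; the text rounds the half-life to 10 min, which gives three half-lives and hence one-eighth. The 10⁹ M⁻¹ case gives about 9% bound and a half-life of just over 1 min, so essentially no signal survives 30 min.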

CHARACTERIZATION OF 
RECOMBINANT DNA BINDING 
PROTEINS 

After the isolation of a recombinant 
phage that is specifically detected with 
a given binding site probe, but not with 
control DNAs, it is necessary to 
demonstrate that this clone encodes a 
recombinant protein of the expected 
DNA binding specificity. In the case of 
a λgt11 recombinant, this is simply 
achieved by isolating lysogenized E. 
coli clones and assaying extracts of induced 
lysogens for a β-galactosidase 
fusion protein that specifically binds 
the recognition site probe used in the 



[Figure 5 panel label: CT DNA.]




screen (see accompanying protocol). 
Chemical and enzymatic footprinting 
in conjunction with the analysis of 
mutant binding sites are required to 
rigorously characterize the DNA bind- 
ing specificity of the recombinant 
protein. The criteria used to relate a 
recombinant protein cloned by this 
strategy with a previously charac- 
terized native protein are discussed in 
the following section. 






DISCUSSION 

In this article we have reviewed the 
development of a new strategy for the 
molecular cloning of sequence-specific 
DNA binding proteins- This strategy 
circumvents purification of such a 
DNA binding protein for the purpose 
of isolating its gene: It simply requires 
a cDNA Horary constructed in the 
phage Xgtl 1 and a DNA recognition 
she probe. As a result of its simplicity 
and its potential to isolate rare cDNA 
clones, mis strategy is expected to 
greatly facilitate the analysis of pro- 
teins thai regulate transcription, DNA 
replication and site-specific recom- 
bination. In fact, within a year of its in- 
troduction, more than ten cDNA clones 
that encode distinct transcriptional 
regulatory proteins have been isolated 
using mis strategy {see Table 1). 

The DNA binding domains of a 
large number of regulatory proteins 
contain either a helix-turn-helix motif
or the "zinc finger" motif (10,15,36,
41). Clones encoding proteins with
either of these structural motifs can be
detected by in situ screening with the
relevant recognition site DNAs. The
protein encoded by the H2TF1/NF-κB
cDNA clone contains two "zinc
fingers" in its DNA binding domain
(Baldwin, LeClair, Singh and Sharp,
unpublished results). In contrast, the
Oct-2 and Oct-1 cDNA clones encode
proteins with a predicted helix-turn-
helix motif (7,34,46,47). Thus, the
screening method appears not to be 
restricted to a sub-class of DNA bind- 
ing domains. 

Many sequence-specific DNA bind- 
ing proteins are functional homodi- 
mers. The binding sites of these 
proteins exhibit two-fold rotational 
symmetry. In these cases the affinity of 

BioTechniques 157 







[...] host strains E. coli Y1090 [...]. Nitrocellulose [...] Schleicher and Schuell (Keene, NH).

dCTP, dGTP, dTTP to 100 μM each final concentration [...] (Klenow fragment).





Incubate at room temperature for 30 min.
Add dATP to a final concentration of 100 μM.
Continue incubation for an additional 30 min.
Stop the end-labeling reaction by adding EDTA to 20 mM.
[...]
Resuspend the [...] pellet in 200 μl water.
[...] 22 μl of 3 M sodium acetate (pH 7.5) and [...]
Visualize the labeled fragments by autoradiography (1 min exposure).

[...] activity incorporated into the probe [...] (see screening conditions below). Some DNA [...]

Incubate [...]
Spread each mixture quickly on a prewarmed [...]
Incubate the LB plates at 42°C until tiny plaques are visible [...]
[...] 132 mm nitrocellulose filters [...] air dry them [...]
Incubate the LB plates at 37°C for 6 h.
Cool the plates at 4°C [...]
[...] with fresh binding buffer. Filters can be stored in binding buffer at 4°C for up to 24 h prior to screening.

Denaturation/Renaturation Protocol

12a. Lift nitrocellulose filters and air dry them for 15 min at room temperature.
Use 100 ml per [...] filters. Repeat [...]






[...] Repeat this step and then block the filters by [...] in BLOTTO.
16. Immerse filters in Hepes binding buffer supplemented with 0.25% Carnation non-fat milk powder for 1 min at 4°C. Screen filters as [...]
Use 150 mm petri dishes for binding reactions [...]. [...] probe solution can be reused with up to five filters [...]
Wash filters four times [...] each wash [...] with 50 ml aliquots of the binding buffer.
[...] paper and perform autoradiography with a tungstate intensifying screen at -70°C for 12 to 24 h.

Identification and purification of sequence-specific clones:

Isolate agar plugs corresponding to positive signals and generate secondary phage stocks according to Maniatis et al. [...]
Incubate 15 min at 37°C.
Add 3 ml of top agarose pre-equilibrated at 47°C.
Invert the solution twice and spread on a prewarmed and dry 100 mm LB plate.
Proceed as described in "Preparation of the nitrocellulose filter replicas" using 82 mm nitrocellulose filters.
For screening 82 mm filters, use 10 ml aliquots of binding solution in 100 mm petri dishes.

[...]

Dilute the saturated cell culture 100-fold in LB medium supplemented with 10 mM MgCl2.

Preparation of crude cell extracts from recombinant phage [...]

Spin a 1 ml aliquot of the induced culture in a microfuge for 1 min at room temperature.
Quickly freeze the resuspended cells in liquid nitrogen.
Thaw the frozen cell suspension, adjust to 0.5 mg/ml lysozyme and incubate for 15 min on ice.
Freeze the dialyzed extract [...]
[...] can be tested in various ways, e.g., gel [...] (5) [...]






the monomer for the complete binding
site is significantly lower than that of
the dimer (37,41). Clones encoding
such homodimeric proteins can also be
detected by in situ screening. The bac-
teriophage λ O protein appears to bind
its dyad symmetric recognition site in
oriλ DNA as a dimer (Roberts and Mc-
Macken, personal communication). A
clone encoding this protein can be
specifically detected by in situ screen-
ing of bacterial colonies using oriλ
DNA as probe (see Figure 3). The
mammalian protein C/EBP also ap-
pears to require dimerization for se-
quence-specific binding (Landschulz,
Johnson and McKnight, personal com-
munication). A λgt11 recombinant en-
coding this protein can be detected by
screening plaque lifts with the cor-
responding DNA binding site probe
(48). Interestingly, the region of C/EBP
required for dimerization, the "leucine
zipper," is shared by a number of
regulatory proteins including GCN4,
Fos, Myc and Jun (28). Recently,
Murre et al. (35) have used the screen-
ing strategy described herein to isolate
a clone encoding an immunoglobulin enhan-
cer binding protein (E12, Table 1) that
requires a new type of dimerization
domain for DNA binding. These ex-
amples clearly show that clones encod-
ing proteins that bind DNA as
homodimers, using different dimeriza-
tion domains, can be successfully
screened as a consequence of their
functional expression in E. coli.
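
As a hedged illustration of the two-fold rotational (dyad) symmetry discussed above, the following Python sketch tests whether the two half-sites of a candidate binding site are reverse complements of each other. The function names and the example sequences are hypothetical and are not taken from this article; ignoring a single central base in odd-length sites is an assumption made for the sketch.

```python
# Illustrative sketch only; names and example sequences are assumptions,
# not part of the published protocol.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_dyad_symmetric(site: str) -> bool:
    """True if the two half-sites are reverse complements of each other.

    For odd-length sites the central base is ignored (an assumption made
    here so that sites with an unpaired central position still qualify).
    """
    s = site.upper()
    half = len(s) // 2
    if half == 0:
        return False
    left, right = s[:half], s[len(s) - half:]
    return left == reverse_complement(right)


if __name__ == "__main__":
    for example in ("GAATTC", "TGACTCA", "GATTACA"):
        print(example, is_dyad_symmetric(example))
```

A site that passes this test presents the same sequence to each subunit of a homodimer, which is why such sites are typically bound with higher affinity by the dimer than by a monomer.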

Most functional DNA binding 
domains, including elements required 
for dimerization, are contained within
relatively small protein segments (ap-
proximately 60-200 amino acids, e.g.,
the DNA binding domains of EBNA-1
(38), GAL-4 (26), GCN-4 (20), Sp1
(25)); therefore, successful screening is
not dependent on full-length cDNA
clones. It simply requires that a given
expression library contain partial
cDNA clones spanning the DNA bind-
ing domain of the desired protein.
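
To make the size argument concrete, the short Python sketch below converts the quoted domain sizes into the minimum in-frame coding length a partial cDNA insert would need. The helper name is hypothetical, and the calculation assumes only the standard three nucleotides per codon; it ignores any flanking sequence a real clone would carry.

```python
# Illustrative arithmetic only; the helper name is hypothetical.
CODON_LENGTH_BP = 3  # three nucleotides encode one amino acid


def min_coding_length_bp(n_residues: int) -> int:
    """Minimum in-frame coding length (bp) for a domain of n_residues."""
    if n_residues < 0:
        raise ValueError("n_residues must be non-negative")
    return CODON_LENGTH_BP * n_residues


if __name__ == "__main__":
    for n in (60, 200):  # domain size range quoted in the text
        print(f"{n} residues -> at least {min_coding_length_bp(n)} bp")
```

Under this assumption, a domain of 60-200 amino acids corresponds to roughly 180-600 bp of coding sequence, well short of a typical full-length cDNA.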

The screening strategy, although a 
very powerful tool, has limitations. 
Since it relies on functional expression 
of a DNA binding domain in E. coli, it 
is highly unlikely to enable the cloning 
of proteins that depend either on a
cell-specific post-translational modifi-
cation or a second distinct subunit for




high affinity DNA binding. In the case 
of heterodimeric proteins (6,16), one of
the two subunits may bind the recogni-
tion site with an affinity that makes the
isolation of its gene by in situ screening
feasible. For example, in the AP-1 and
c-Fos complex, c-Fos confers high af-
finity binding, but the AP-1 subunit
alone binds the same recognition site
with detectable affinity (8,17). Given
the clone for one subunit of a hetero-
dimeric DNA binding protein, it may
be possible to clone the gene encoding
the second subunit by using a variation
of the expression screening approach
(see below). Another limitation of this
strategy is that, initially, a recombinant
protein can only be related to a pre-
viously identified native protein by
comparing the DNA binding specifici-
ties of the two. However, in a situation
where multiple DNA binding proteins
recognize the same sequence, this cri-
terion is very difficult to apply (21,44).
Eventually, direct structural analyses
are necessary to resolve this issue. An-
tibodies generated against the cloned
protein permit the detection of shared
antigenic determinants (47). Peptide
mapping performed on analytical
amounts of the native and cloned pro-
teins constitutes a definitive structural
comparison (7). A third limitation of
this strategy is that its application can
result in the isolation of recombinant
phage whose β-galactosidase fusion
proteins do not appear to bind DNA
with detectable affinity in solution (T.
Kristie, personal communication).

The strategy of cloning a gene on
the basis of detection of its functional
recombinant product with a ligand
probe has considerable potential. It
may be possible to use different types
of ligands, including RNA recognition
sites, hormones, protein subunits (e.g.,
a subunit of a heterodimeric DNA
binding protein), nucleotides, metal
ions, etc., to directly clone genes that
encode the relevant proteins. During
the development and application of the
strategy reviewed in this article, a few
ligand-mediated screens of this type
have been described. The cDNA for a
calmodulin-binding protein has been
cloned using iodinated calmodulin as a
probe to screen a λgt11 expression
library (43). Similarly, λgt11 clones
expressing the regulatory subunit of a
cAMP dependent protein kinase could
be detected by an in situ screen of a
library using 32P-labeled cAMP as a
probe (27). Finally, mutants of ras that
are defective in GTP binding have been
isolated by using 32P-labeled GTP in
an in situ colony-binding assay (11).
Thus, the principle underlying this stra-
tegy appears quite general in its scope.

ACKNOWLEDGMENTS 

The experiments described in this 
overview were carried out in the 
Laboratory of Dr. Phillip A. Sharp. We 
gratefully acknowledge Phillip Sharp 
for many stimulating discussions and 
for his critical comments on this 
manuscript. We also thank our col-
leagues at the Center for Cancer Re-
search for many helpful suggestions, in
particular A. Baldwin, M. Brown, M.
Garcia-Blanco, M. Griot, T. Hayes, T.
Kristie and K. LeClair.

H.S. acknowledges postdoctoral
support from the Jane Coffin Childs
Fund. R.G.C. is the recipient of a Ciba-
Geigy, Basel (Switzerland) fellowship
and a Schweizerischer Nationalfonds
stipend. J.H.L. is a Special Fellow of
the Leukemia Society of America.

REFERENCES 

1. Berg, O.G., R.B. Winter and P.H. von Hippel. 1982. How do genome-regulatory proteins locate their DNA target sites? Trends in Biochem. Sci. 7:52-55.
2. Bodner, M., J.L. Castrillo, L.E. Theill, T. Deerinck, M. Ellisman and M. Karin. 1988. The pituitary-specific transcription factor GHF-1 is a homeo box containing protein. Cell 55:505-518.
3. Broome, S. and W. Gilbert. 1978. Immunological screening method to detect specific translation products. Proc. Natl. Acad. Sci. USA 75:2746-2749.
4. Chodosh, L.A., R.W. Carthew and P.A. Sharp. 1986. A single polypeptide possesses the binding and transcription activities of the Adenovirus major late transcription factor. Mol. Cell. Biol. 6:4723-4733.
5. Chodosh, L.A. 1988. Mobility shift DNA-binding assay using gel electrophoresis, 12.2. In F.M. Ausubel, R. Brent, R.E. Kingston, D.D. Moore, J.G. Seidman, J.A. Smith and K. Struhl (Eds.), Current Protocols in Molecular Biology. John Wiley and Sons, New York, NY.
6. Chodosh, L.A., A.S. Baldwin, R.W. Carthew and P.A. Sharp. 1988. Human CCAAT-binding proteins have heterologous subunits. Cell 53:11-24.

7. Clerc, R.G., L.M. Corcoran, J.H. LeBowitz, D. Baltimore and P.A. Sharp. 1988. The B-cell specific Oct-2 protein contains POU box and homeo box-type domains. Genes and Development 2:1570-1581.
8. Curran, T. and B.R. Franza. 1988. Fos and Jun: The AP-1 connection. Cell 55:395-397.
9. Didier, D.K., J. Schiffenbauer, S.L. Woulfe, M. Zacheis and B.D. Schwartz. 1988. Characterization of the cDNA encoding a protein binding to the major histocompatibility complex class II Y box. Proc. Natl. Acad. Sci. USA 85:7322-7326.
10. Evans, R.M. and S.M. Hollenberg. 1988. Zinc fingers: Gilt by association. Cell 52:1-3.
11. Feig, L.A., B.T. Pan, T.M. Roberts and G.M. Cooper. 1986. Isolation of ras GTP-binding mutants using an in situ colony-binding assay. [...]
12. Fried, M. and D. Crothers. 1981. Equilibria and kinetics of lac repressor-operator interactions by polyacrylamide gel electrophoresis. Nuc. Acids Res. 9:6505-6525.
13. Galas, D. and A. Schmitz. 1978. DNase footprinting: A simple method for the detection of protein-DNA binding specificity. Nuc. Acids Res. 5:3157-3170.

14. Garner, M. and A. Revzin. 1981. A gel electrophoresis method for quantifying the binding of proteins to specific DNA regions: Applications to components of the E. coli lactose operon regulatory system. Nuc. Acids Res. 9:3047-3060.
15. Gehring, W.J. 1987. Homeo boxes in the study of development. Science 236:1245-[...]
16. Hahn, S. and L. Guarente. 1988. Yeast HAP2 and HAP3: Transcriptional activators in a heteromeric complex. Science 240:317-321.
17. Halazonetis, T.D., K. Georgopoulos, M.E. Greenberg and P. Leder. 1988. c-Jun dimerizes with itself and with c-Fos, forming complexes of different DNA binding affinities. [...]
18. Haymerle, H., J. Herz, G.M. Bressan, R. Frank and K.K. Stanley. 1986. Efficient construction of cDNA libraries in plasmid expression vectors using an adaptor strategy. Nuc. Acids Res. 14:8615-8624.
19. Helfman, D.M., J.R. Feramisco, J.C. Fiddes, G.P. Thomas and S.H. Hughes. 1983. Identification of clones that encode chicken tropomyosin by direct immunological screening of a cDNA expression library. Proc. Natl. Acad. Sci. USA 80:31-35.

20. Hope, I.A. and K. Struhl. 1986. Functional dissection of a eukaryotic transcriptional activator [...]
21. [...] and L.H. Glimcher. 1988. Distinct cloned class II MHC DNA binding proteins recognize the X box transcription element. Science 242:69-71.
22. Huynh, T.V., R.A. Young and R.W. Davis. [...] Construction and screening cDNA libraries in λgt10 and λgt11, p. 49-78. In D.M. Glover (Ed.), DNA Cloning, Vol. I: A Practical Approach. IRL Press, Oxford.
23. [...] D.M. Simmons, L. Swanson and M.G. Rosenfeld. 1988. A tissue-specific transcription factor containing a homeo domain specifies a pituitary phenotype. Cell 55:519-529.
24. Kadonaga, J.T. and R. Tjian. 1986. Affinity purification of sequence-specific DNA binding proteins. Proc. Natl. Acad. Sci. USA 83:5889-5893.
25. Kadonaga, J.T., K.R. Carner, F.R. Masiarz and R. Tjian. 1987. Isolation of cDNA encoding transcription factor Sp1 and functional analysis of the DNA binding domain. Cell 51:1079-1090.
26. Keegan, L., G. Gill and M. Ptashne. 1986. Separation of DNA binding from the transcription-activating function of a eukaryotic regulatory protein. Science 231:699-704.
27. Lacombe, M.L., D. Ladant, R. Mutzel and M. Veron. 1987. Gene isolation by direct in situ cAMP binding. Gene 58:29-36.
28. Landschulz, W.H., [...] B.J. Graves and S.L. McKnight. 1988. Isolation of a recombinant copy of the gene encoding C/EBP. Genes and Development 2:786-800.
29. Leary, J.J., D.J. Brigati and D.C. Ward. 1983. Rapid and sensitive colorimetric method for visualizing biotin-labeled DNA probes hybridized to DNA or RNA immobilized on nitrocellulose: Bio-blots. Proc. Natl. Acad. Sci. USA 80:4045-4049.
30. Maniatis, T., E.F. Fritsch and J. Sambrook. 1982. Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
31. Maniatis, T., S. Goodbourn and J.A. Fischer. 1987. Regulation of inducible and tissue specific gene expression. Science 236:1237-1245.

32. Milman, G., A.L. Scott, M.S. Cho, S.C. Hartman, D.K. Ades, G.S. Hayward, P.F. Ki, J.T. August and S.D. Hayward. 1985. Carboxyl-terminal domain of the Epstein-Barr virus nuclear antigen is highly immunogenic in man. Proc. Natl. Acad. Sci. USA 82:6300-[...]
33. Miyamoto, M., T. Fujita, Y. Kimura, M. Maruyama, H. Harada, Y. Sudo, T. Miyata and T. Taniguchi. 1988. Regulated expression of a gene encoding a nuclear factor, IRF-1, that specifically binds to IFN-β gene regulatory elements. Cell 54:903-913.
34. Müller, M.M., S. Ruppert, W. Schaffner and P. Matthias. 1988. A cloned octamer transcription factor stimulates transcription from lymphoid specific promoters in non-B cells. Nature 336:544-551.
35. Murre, C., P. Schonleber-McCaw and D. Baltimore. 1989. A new DNA binding and dimerization motif in immunoglobulin enhancer binding, daughterless, MyoD and myc proteins. Cell, in press.
36. Pabo, C.O. and R.T. Sauer. 1984. Protein-DNA recognition. Ann. Rev. Biochem. 53:293-321.
37. Ptashne, M. 1986. A Genetic Switch. Cell Press and Blackwell Scientific Publications, Cambridge, MA.
38. Rawlins, D.R., G. Milman, S.D. Hayward [...] DNA binding of the Epstein-Barr virus nuclear antigen (EBNA-1) to clustered sites in the plasmid maintenance region. Cell 42:859-868.
39. Roberts, J.D. and R. McMacken. 1983. The bacteriophage λ O replication protein: Isolation and characterization of the amplified initiator. Nuc. Acids Res. 11:7435-7452.



40. Rosenfeld, P.J. and T.J. Kelly. 1986. Purification of nuclear factor I by DNA recognition [...]
41. Schleif, R. 1988. DNA binding by proteins. Science 241:1182-1187.
42. Short, J.M., J.M. Fernandez, J.A. Sorge and W.D. Huse. 1988. λZAP: A bacteriophage λ expression vector with in vivo excision properties. Nuc. Acids Res. 16:7583-7600.
43. Sikela, J.M. and W.E. Hahn. 1987. Screening an expression library with a ligand probe: Isolation and sequence of a cDNA corresponding to a brain calmodulin-binding protein. Proc. Natl. Acad. Sci. USA 84:3038-3042.
44. Singh, H., J.H. LeBowitz, A.S. Baldwin and P.A. Sharp. 1988. Molecular cloning of an enhancer binding protein: Isolation by screening of an expression library with a recognition site DNA. Cell 52:415-423.
45. Singh, H. 1988. Detection, purification and characterization of cDNA clones encoding DNA-binding proteins, 12.7. In F.M. Ausubel, R. Brent, R.E. Kingston, D.D. Moore, J.G. Seidman, J.A. Smith and K. Struhl (Eds.), Current Protocols in Molecular Biology. John Wiley and Sons, New York, NY.
46. Staudt, L.M., R.G. Clerc, H. Singh, J.H. LeBowitz, P.A. Sharp and D. Baltimore. 1988. Cloning of a lymphoid specific cDNA encoding a protein binding the regulatory octamer DNA motif. Science 241:[...]

47. Sturm, R.A., G. Das and W. Herr. 1988. The ubiquitous octamer protein Oct-1 contains a POU domain with a homeo subdomain. Genes and Development, in press.
48. [...] Johnson, W.H. Landschulz and S.L. McKnight. 1988. In situ detection of sequence-specific DNA binding activity specified by a recombinant bacteriophage. Genes and Development [...]
49. Walter, P., S. Green, G. Greene, A. Krust, J.-M. Bornert, J.-M. Jeltsch, A. Staub, E. Jensen, G. Scrace, M. Waterfield and P. Chambon. 1985. Cloning of the human estrogen [...]
50. Weinberger, C., S.M. Hollenberg, E.S. Ong, J.M. Harmon, S.T. Brower, J. Cidlowski, E.B. Thompson, M.G. Rosenfeld and R.M. Evans. 1985. Identification of human glucocorticoid receptor complementary DNA clones by epitope selection. Science 228:740-742.
51. [...] and A. Ray. 1987. Adaptors, [...] methylation. Meth. Enzymol. [...]
52. Young, R.A. and R.W. Davis. 1983. Efficient isolation of genes by using antibody probes. Proc. Natl. Acad. Sci. USA 80:1194-1198.
53. Young, R.A. and R.W. Davis. 1983. Yeast RNA polymerase II genes: Isolation with antibody probes. Science 222:778-782.

Address correspondence to: 
Harinder Singh, Ph.D.
Howard Hughes Medical Institute and
Dept. of Molecular Genetics & Cell Biology
University of Chicago
Chicago, IL 60637






